Parsing Complementizer Phrases In Machine Translation System

Author

  • T. Suryakanthi
Abstract

Every language has a finite number of words and a finite number of rules, but an infinite number of sentences. Sentences are formed not by words alone but by structural units known as constituents. Constituency analysis begins with the larger units of grammar and breaks them down into progressively smaller units. Syntax is the study of the relationships between words and of how they are combined to construct phrases and sentences. Sentences can also be embedded, that is, a sentence can contain another sentence within it. Embedded phrases are complements, like NP or PP complements, except that they are united under a complementizer phrase and may be introduced by a complementizer such as a subordinating conjunction or a relative pronoun. Parsing is the process of identifying the structure of a sentence in order to determine whether it is well formed with respect to a language; in other words, it checks whether the sentence can be derived from the given grammar of the language. The two basic approaches to parsing are top-down and bottom-up. This paper describes the process of parsing complementizer phrases in a machine translation system.

Keywords: Complementizer Phrase (CP), Machine Translation (MT), Government and Binding (GB), Parts of Speech (POS)
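As a rough illustration of the top-down approach mentioned in the abstract, the sketch below checks whether a sequence of POS tags can be derived from a small grammar that contains a CP rule. The grammar, the tag names (`PRON`, `DET`, `N`, `V`, `C`), and the function names are assumptions made for this example; they are not taken from the paper, whose actual grammar and parser are not shown here.

```python
# Toy grammar (an assumption for illustration, not the paper's grammar).
# Keys are nonterminals; any symbol that is not a key is treated as a POS tag.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["PRON"]],
    "VP": [["V", "CP"], ["V", "NP"], ["V"]],
    "CP": [["C", "S"]],  # complementizer followed by an embedded sentence
}


def expand(symbol, tags, pos):
    """Yield every end position reachable by deriving `symbol` from tags[pos:]."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must match the POS tag at the current position.
        if pos < len(tags) and tags[pos] == symbol:
            yield pos + 1
        return
    # Top-down step: start from the nonterminal and try each of its rules.
    for rhs in GRAMMAR[symbol]:
        positions = [pos]
        for child in rhs:
            positions = [end for p in positions for end in expand(child, tags, p)]
        yield from positions


def is_well_formed(tags):
    """Return True if the whole tag sequence can be derived from S."""
    return any(end == len(tags) for end in expand("S", tags, 0))


# "She thinks that the dog barks" tagged by hand (hypothetical tags).
print(is_well_formed(["PRON", "V", "C", "DET", "N", "V"]))
# A complementizer with no embedded sentence after it.
print(is_well_formed(["PRON", "V", "C"]))
```

The recognizer starts from `S` and expands rules downward until it reaches POS tags, which is the top-down strategy; a bottom-up parser would instead start from the tags and combine them into larger constituents. The toy grammar has no left-recursive rules, which is what keeps this simple backtracking search from looping.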


Similar resources

Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees

Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences, and attaching the correct semantic roles to them, may lead to improvements in many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...


Discontinuous Verb Phrases in Parsing and Machine Translation of English and German

In this paper, we focus on the verb-particle (V-Prt) split construction in English and German and its difficulty for parsing and Machine Translation (MT). For German, we use an existing test suite of V-Prt split constructions, while for English, we build a new and comparable test suite from raw data. These two data sets are then used to perform an analysis of errors in dependency parsing, word-...


Practical Approach to Syntax-based Statistical Machine Translation

This paper presents a practical approach to statistical machine translation (SMT) based on syntactic transfer. Conventionally, phrase-based SMT generates an output sentence by combining phrase (multiword sequence) translation and phrase reordering without syntax. On the other hand, SMT based on tree-to-tree mapping, which involves syntactic information, is theoretical, so its features remain un...


Phrase-Boundary Translation Model Using Shallow Syntactic Labels

The phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target-side phrases of the training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...


Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experi...



Publication date: 2012